HDDS-15605. Fix flaky testContainerExclusionWithClosedContainerException #10621

Merged
adoroszlai merged 1 commit into apache:master from chihsuan:HDDS-15605 on Jul 5, 2026

@chihsuan chihsuan commented Jun 27, 2026

Contributor

What changes were proposed in this pull request?

testContainerExclusionWithClosedContainerException intermittently fails at the datanode assertion (Expecting empty but was: [<uuid>(null/null)]).

The test asserts that after a ClosedContainerException only the closed container is excluded. Under the default ALL_COMMITTED watch level, however, a momentarily slow follower whose watch-for-commit times out is added to the client exclude list. This is the intended slow-node-avoidance behaviour of the configurable watchType (HDDS-2887). An empty datanode set is therefore not an invariant under ALL_COMMITTED; the assertion predates the watchType config and never accounted for it.

The test's subject is container exclusion, which is independent of the watch level. This change removes the non-invariant getDatanodes().isEmpty() assertion (the container and pipeline assertions stay); watch-level datanode exclusion is already covered by testDatanodeExclusionWithMajorityCommit.
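
To illustrate why the datanode assertion is not an invariant, here is a minimal, self-contained sketch. It does not use the real Ozone classes: `ExcludeListModel`, `simulateWrite`, and `containerInvariantHolds` are hypothetical stand-ins, and the exact shape of the retained assertions (closed container present, pipeline set empty) is an assumption based on the description above.

```java
import java.util.HashSet;
import java.util.Set;

public class ExcludeListSketch {

  /** Hypothetical stand-in for the client exclude list; not the real Ozone class. */
  static final class ExcludeListModel {
    final Set<Long> containerIds = new HashSet<>();
    final Set<String> pipelineIds = new HashSet<>();
    final Set<String> datanodes = new HashSet<>();
  }

  /**
   * Models a write that hits a ClosedContainerException on closedContainerId.
   * A non-null slowFollower models a follower whose watch-for-commit timed out
   * under ALL_COMMITTED and was therefore added to the exclude list.
   */
  static ExcludeListModel simulateWrite(long closedContainerId, String slowFollower) {
    ExcludeListModel list = new ExcludeListModel();
    list.containerIds.add(closedContainerId);
    if (slowFollower != null) {
      list.datanodes.add(slowFollower);
    }
    return list;
  }

  /** The checks assumed to remain: closed container excluded, no pipeline excluded. */
  static boolean containerInvariantHolds(ExcludeListModel list, long closedContainerId) {
    return list.containerIds.contains(closedContainerId) && list.pipelineIds.isEmpty();
  }

  public static void main(String[] args) {
    ExcludeListModel quiet = simulateWrite(7L, null);
    ExcludeListModel slow = simulateWrite(7L, "dn-2");

    // The container/pipeline checks hold in both runs.
    System.out.println(containerInvariantHolds(quiet, 7L));
    System.out.println(containerInvariantHolds(slow, 7L));

    // The removed check holds only when no follower happened to be slow.
    System.out.println(quiet.datanodes.isEmpty());
    System.out.println(slow.datanodes.isEmpty());
  }
}
```

In this model the container and pipeline checks pass regardless of follower timing, while the datanode check depends on it, which is the flakiness pattern described above.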

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-15605

How was this patch tested?

intermittent-test-check on the fork, TestFailureHandlingByClient, test-name=ALL, 100 runs each (10 splits x 10 iterations):

  • This branch: the assertion did not recur (0/100). (run)
  • master baseline, same harness: the assertion reproduced in ~5/100 runs (Expecting empty but was: [...]), matching the reported intermittency. (run)
  • Remaining noise, unrelated to this change (present on master too):
    • A TimeoutException in TestHelper.waitForContainerClose (container state transition not completing under the harness's extreme parallel load of 10 concurrent splits); it occurs before the modified assertion and is a pre-existing load-sensitivity of the test.
    • testDatanodeExclusionWithMajorityCommit failures are the known HDDS-13972 (@Flaky).
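
As a rough reading of the 0/100 versus ~5/100 result, the sketch below estimates how likely 100 clean runs would be if the assertion still failed at the baseline rate. It assumes a per-run failure probability of 0.05 and independent runs; both are simplifying assumptions, not properties measured by the harness, and `FlakeOdds` is a hypothetical helper.

```java
public class FlakeOdds {

  /**
   * Probability that every one of `runs` independent runs passes,
   * given a per-run failure probability of `failureRate`.
   */
  static double probabilityAllPass(double failureRate, int runs) {
    return Math.pow(1.0 - failureRate, runs);
  }

  public static void main(String[] args) {
    // Assumed inputs: ~5% baseline failure rate, 100 runs.
    double p = probabilityAllPass(0.05, 100);
    System.out.printf("P(100 clean runs at 5%% failure rate) = %.4f%n", p);
  }
}
```

Under those assumptions the chance of seeing 0/100 by luck is roughly 0.6%, so the clean result on this branch is unlikely to be coincidence.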

@adoroszlai adoroszlai merged commit c7d83a7 into apache:master Jul 5, 2026
32 of 33 checks passed
@adoroszlai

Contributor

Thanks @chihsuan for the patch.

2 participants